6 research outputs found

    Compressed Subsequence Matching and Packed Tree Coloring

    We present a new algorithm for subsequence matching in grammar compressed strings. Given a grammar of size $n$ compressing a string of size $N$ and a pattern string of size $m$ over an alphabet of size $\sigma$, our algorithm uses $O(n+\frac{n\sigma}{w})$ space and $O(n+\frac{n\sigma}{w}+m\log N\log w\cdot occ)$ or $O(n+\frac{n\sigma}{w}\log w+m\log N\cdot occ)$ time. Here $w$ is the word size and $occ$ is the number of occurrences of the pattern. Our algorithm uses less space than previous algorithms and is also faster for $occ=o(\frac{n}{\log N})$ occurrences. The algorithm uses a new data structure that allows us to efficiently find the next occurrence of a given character after a given position in a compressed string. This data structure in turn is based on a new data structure for the tree color problem, where the node colors are packed in bit strings. Comment: To appear at CPM '1
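
    The "next occurrence of a given character after a given position" query can be illustrated on a plain, uncompressed string. The sketch below is not the paper's grammar-compressed data structure: it stores a full table, which takes far more space than the bounds quoted above. The names `build_next_table` and `match_from` are hypothetical and chosen only for this illustration.

```python
def build_next_table(text):
    """table[i][c] = smallest j >= i with text[j] == c (key absent if none).

    Assumption: the text is stored uncompressed, so this table is only a
    stand-in for the compressed next-occurrence structure in the abstract.
    """
    table = [dict() for _ in range(len(text) + 1)]
    for i in range(len(text) - 1, -1, -1):
        table[i] = dict(table[i + 1])  # inherit answers from position i + 1
        table[i][text[i]] = i          # text[i] itself is the nearest match
    return table


def match_from(table, pattern, start):
    """Greedily match `pattern` as a subsequence beginning at `start`.

    Returns the index of the last matched character, or None if the
    pattern does not occur as a subsequence of the text from `start` on.
    """
    pos = start
    for c in pattern:
        j = table[pos].get(c)
        if j is None:
            return None
        pos = j + 1
    return pos - 1


table = build_next_table("abracadabra")
print(match_from(table, "aca", 0))  # earliest end of an occurrence from 0
print(match_from(table, "aca", 6))  # no occurrence starting at or after 6
```

    Each pattern character costs one lookup here; in the compressed setting the abstract's bounds suggest that each such step instead costs time depending on $\log N$.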

    On the size of DASG for multiple texts

    We present a left-to-right algorithm for building the automaton that accepts all subsequences of a given set of strings. We prove that the number of states of this automaton can be quadratic if it is built on at least two texts.
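
    To make the accepted language concrete, the following sketch simulates a subsequence acceptor for several texts on the fly. It is an assumption-laden illustration, not the paper's left-to-right construction: a state is simply a tuple holding one position per text, and equivalent states are not merged as they would be in the actual automaton. The function names are hypothetical.

```python
def build_next_table(text):
    """table[i][c] = smallest j >= i with text[j] == c (key absent if none)."""
    table = [dict() for _ in range(len(text) + 1)]
    for i in range(len(text) - 1, -1, -1):
        table[i] = dict(table[i + 1])
        table[i][text[i]] = i
    return table


def accepts_subsequence(texts, candidate):
    """Return True if `candidate` is a subsequence of at least one text.

    The state is a tuple with one entry per text: the position from which
    matching continues, or None once that text can no longer match.
    """
    tables = [build_next_table(t) for t in texts]
    state = tuple(0 for _ in texts)
    for c in candidate:
        next_state = []
        for table, pos in zip(tables, state):
            j = None if pos is None else table[pos].get(c)
            next_state.append(None if j is None else j + 1)
        state = tuple(next_state)
        if all(p is None for p in state):
            return False  # dead in every text
    return True


texts = ["abc", "bca"]
print(accepts_subsequence(texts, "ba"))  # subsequence of "bca"
print(accepts_subsequence(texts, "cb"))  # subsequence of neither text
```

    Because each state records a position in every text, the number of distinct reachable tuples can grow with the product of the text lengths, which gives some intuition for why more than one text changes the size of the automaton.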

    The minimum dawg for all suffixes of a string and its applications

    Abstract. For a string w over an alphabet Σ, we consider a composite data structure called the all-suffixes directed acyclic word graph (ASDAWG). ASDAWG(w) has |w| + 1 initial nodes, and the dag induced by all reachable nodes from the k-th initial node conforms with DAWG(w[k:]), where w[k:] denotes the k-th suffix of w. We prove that the size of the minimum ASDAWG(w) (MASDAWG(w)) is Θ(|w|) for |Σ| = 1, and is Θ(|w|^2) for |Σ| ≥ 2. Moreover, we introduce an on-line algorithm which directly constructs MASDAWG(w) for given w, whose running time is linear with respect to its size. We also demonstrate some application problems, beginning-sensitive pattern matching, region-sensitive pattern matching, and VLDC-pattern matching, for which ASDAWGs are useful.

    Discovering best variable-length-don't-care patterns

    Abstract. A variable-length-don't-care pattern (VLDC pattern) is an element of the set Π = (Σ ∪ {⋆})*, where Σ is an alphabet and ⋆ is a wildcard matching any string in Σ*. Given two sets of strings, we consider the problem of finding the VLDC pattern that is the most common to one and the least common to the other. We present a practical algorithm to find such best VLDC patterns exactly, powerfully sped up by pruning heuristics. We introduce two versions of our algorithm: one employs a pattern matching machine (PMM), whereas the other uses an index structure called the Wildcard Directed Acyclic Word Graph (WDAWG). In addition, we consider a more generalized problem of finding the best pair ⟨q, k⟩, where k is the window size that specifies the length of an occurrence of the VLDC pattern q matching a string w. We present three algorithms solving this problem with pruning heuristics, using dynamic programming (DP), PMMs and WDAWGs, respectively. Although the two problems are NP-hard, we experimentally show that our algorithms run remarkably fast.
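
    As a small illustration of what matching a VLDC pattern means, the sketch below checks a single pattern against a single string by greedy segment search. It rests on several assumptions that are not taken from the paper: the ASCII character `*` stands in for the wildcard ⋆, the pattern must match the whole string, and the helper names `vldc_match` and `count_matches` are hypothetical. It uses neither a PMM nor a WDAWG and does not perform the best-pattern search described in the abstract.

```python
def vldc_match(pattern, text, wildcard="*"):
    """Return True if `text` can be obtained from `pattern` by replacing
    each wildcard with some (possibly empty) string.

    Assumption: whole-string matching, with '*' standing in for the
    wildcard symbol used in the abstract.
    """
    parts = pattern.split(wildcard)
    if len(parts) == 1:
        return pattern == text  # no wildcard: plain equality
    first, *middle, last = parts
    if len(text) < len(first) + len(last):
        return False
    if not text.startswith(first) or not text.endswith(last):
        return False
    pos = len(first)                # where the next segment may start
    limit = len(text) - len(last)   # segments must end before the suffix
    for segment in middle:
        idx = text.find(segment, pos, limit)  # leftmost fit is safe here
        if idx == -1:
            return False
        pos = idx + len(segment)
    return True


def count_matches(pattern, strings):
    """Number of strings in `strings` matched by `pattern`."""
    return sum(vldc_match(pattern, s) for s in strings)


positive = ["axxbyyc", "abc", "abxc"]
negative = ["acb", "cba"]
print(count_matches("a*b*c", positive), count_matches("a*b*c", negative))
```

    Comparing the two counts for one candidate pattern is the basic measurement behind "most common to one and least common to the other"; the actual scoring function and the pruning heuristics are those of the paper and are not reproduced here.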